ABSTRACT

We  report  on  the  University  of  Lugano’s  participation  in 
the  Blog  track  of  TREC  2008.  In  particular  we  describe  our 
system  for  performing  opinion  retrieval  and  blog  distillation. 

1.  INTRODUCTION 

The 2008 Blog track continued on from the successful 2007
Blog track [12], including the same opinion retrieval and blog
distillation activities. This year was our first participation
in TREC, and we took part in both the opinion retrieval and
blog distillation tasks. We aimed to test the effectiveness
of learning methods in each of these tasks. In the topic
retrieval phase (baseline) of the opinion retrieval task, we used
a rank learning method [18] to combine additional information,
including the content of incoming hyperlinks and tag
data from social bookmarking websites, with our basic
retrieval method, the Divergence from Randomness
version of BM25 (DFR_BM25) [2]. The results show an
improvement of more than 20% in Mean Average Precision (MAP)
for the proposed method in comparison with DFR_BM25. We then
examined the effectiveness of learning methods in assigning
opinion scores to documents. We compared a Support Vector
Machine (SVM) based learning system with a simpler
system that used the average opinionatedness of each word
in the document. Although the results of our TREC submission
were not satisfactory and did not improve on the baseline,
repeating the experiments showed an improvement over the
baseline when learning methods were used for document opinion
scoring.

In the distillation task we tried to make use of link
information. We first implemented a simple blog
search method based on the voting model used in the expert
search problem [11]. This baseline performs better
than the median of the TREC’08 participants for most queries.
We then tried to take into account structure-based evidence
using a rank learning approach. Unfortunately, the rank learning
model appears to be very sensitive to properties of the data
set, and did not perform well in our experiments.

The remainder of this paper is structured as follows. We
describe the method used in our baseline submission in
section 2. In section 3 we discuss our approach to the opinion
retrieval task. The distillation task is discussed in detail in
section 4.

2.  BASELINE  RELEVANCE  RETRIEVAL 

For the baseline blog post retrieval task, we built a system
that used a rank learning framework to combine relevance
scores for each post with other forms of evidence. We first
used the Terrier Information Retrieval system [14] to index
the blog post (permalink) collection. We then tested various
state-of-the-art retrieval models, including BM25 [15],
Divergence from Randomness [3] and Language Modeling [19],
for generating relevance scores for individual posts. The
TREC 2007 relevance assessments were used for the
evaluation. The result of our analysis was that DFR_BM25 [2]
produced the best result, with a Mean Average Precision (MAP)
of 0.2138.
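As a reminder of how the MAP figures reported in this paper are defined, the following is a minimal sketch of average precision and its mean over topics. The function names and the dictionary-based layout of runs and relevance judgments are assumptions made for illustration; the official evaluation tooling used for TREC is not reproduced here.

```python
from typing import Dict, List, Set


def average_precision(ranking: List[str], relevant: Set[str]) -> float:
    """Average precision of one ranked list against the relevant ids."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            # Precision at the rank of each relevant document.
            precision_sum += hits / rank
    # Relevant documents that are never retrieved contribute zero.
    return precision_sum / len(relevant)


def mean_average_precision(runs: Dict[str, List[str]],
                           qrels: Dict[str, Set[str]]) -> float:
    """Mean of the per-topic average precision over all judged topics."""
    if not qrels:
        return 0.0
    total = sum(average_precision(runs.get(topic, []), rel)
                for topic, rel in qrels.items())
    return total / len(qrels)
```

A topic with no returned results counts as an average precision of zero, so it lowers the mean rather than being skipped.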

We then extended this content-based retrieval technique
with additional information, including the content of
incoming hyperlinks and tag data from social bookmarking
websites. The latter has recently been shown to be useful for
improving Web search [9, 4, 17]. In order to incorporate
these additional sources of evidence we rely on a rank
learning approach [18, 5].

We  trained  an  SVM-map  [18]  rank  learner  to  optimally 
combine  different  forms  of  evidence  into  a  single  retrieval 
function.  The  different  forms  of  investigated  evidence  were: 

•  the post content relevance score

•  the inlink count, i.e. the number of incoming
hyperlinks from other permalinks in the collection

•  the cosine similarity between inlink anchor-text and
the query

•  the query length

•  the popularity of the domain¹ (hostname) of the URL
of the post on the social bookmarking (tagging)
website Delicious²

•  a tag-query similarity score based on the relative
frequency with which query terms are used to annotate
the permalink’s domain on Delicious.

¹ The domain of the permalink URL was used instead of the
URL itself because of the sparsity of data on Delicious.
² http://delicious.com

Run        MAP     R-Precision  P@10
DFR_BM25   0.2138  0.3836       0.4087
run0       0.2663  0.2780       0.4273

Table 1: Topic-Relevance results for submitted baseline.

Table 1 shows the topic relevance results of the
DFR_BM25 method and our baseline system (run0). The
results indicate that using a rank learning approach to
include additional information, such as the content of incoming
hyperlinks and tag data from social bookmarking websites,
can boost the performance of a baseline content-based
retrieval system.
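The combination of evidence described in this section can be sketched as follows, assuming that the learned retrieval function is a weighted sum of the feature values. The feature labels, the weights and the helper names are hypothetical and chosen only for illustration; in the actual system the weights are produced by the rank learner rather than set by hand.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence

# Illustrative labels for the six forms of evidence listed in this section.
FEATURES = ("content_score", "inlink_count", "anchor_cosine",
            "query_length", "domain_popularity", "tag_similarity")


def cosine_similarity(a: Sequence[str], b: Sequence[str]) -> float:
    """Cosine similarity between two bags of words (raw term counts)."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = (math.sqrt(sum(v * v for v in ca.values()))
            * math.sqrt(sum(v * v for v in cb.values())))
    return dot / norm if norm else 0.0


def linear_score(features: Dict[str, float],
                 weights: Dict[str, float]) -> float:
    """Weighted sum of feature values; missing entries count as zero."""
    return sum(weights.get(name, 0.0) * features.get(name, 0.0)
               for name in FEATURES)


def rank_posts(posts: Dict[str, Dict[str, float]],
               weights: Dict[str, float]) -> List[str]:
    """Post ids sorted by decreasing combined score."""
    return sorted(posts, key=lambda p: linear_score(posts[p], weights),
                  reverse=True)
```

With hand-set weights, a post with a slightly lower content score can overtake another one when its link evidence is strong, which is the effect the learned combination is meant to capture.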

3.  OPINION  RETRIEVAL 

Our approach to ranking blog posts by their opinionatedness
again relies on a learning framework. In this case we
train a Learning to Rank system to take both a relevance
score (output by the rank learner described above) and an
“opinion score” for each document into account when
producing an output ranking. The advantage of this approach
is that we do not need to decide explicitly how best to
combine these forms of evidence, but can rely on historical data
for fine tuning the retrieval.

The problem then is to estimate a score for the
“opinionatedness” of each document. We have two approaches to
doing this. In the first approach we calculate an opinion
score for each term in the document and then combine the
scores over all terms in the document. In the second we train
a classification system to distinguish between opinionated
and non-opinionated posts. We then use the confidence of
the classifier as an opinion score for the document.

To better describe these techniques, we introduce some
notation. Assume that we have a set of labeled training
documents, denoted $D$, and a set of training queries,
denoted $q_1, \ldots, q_n$. For each query $q_i$ we have a set of
relevant documents $R_i \subset D$ and a set of opinionated
documents $O_i \subset R_i$ that were judged by assessors to be
relevant and opinionated respectively. Let $O = \bigcup_i O_i$ be
the set of all opinionated documents in our training set and
$R = \bigcup_i R_i$ be the set of all relevant documents (note
that $O \subset R$). The relative frequency of a particular
term $t$ in the set $O$ is denoted $p(t|O)$ and calculated as:

$$p(t|O) = \frac{\sum_{d \in O} c(t, d)}{\sum_{d \in O} |d|} \qquad (1)$$

where $c(t, d)$ denotes the number of occurrences of term $t$ in
document $d$ and $|d|$ denotes the total number of words in
the document. The relative frequency $p(t|R)$ in the set $R$ is
defined analogously. We can now calculate an opinion score for
each term $t$ using the technique proposed by Amati et al. [1]
as follows:

$$\mathrm{opinion}(t) = p(t|O) \log \frac{p(t|O)}{p(t|R)} \qquad (2)$$

Note that the summation of the opinion score over all terms
gives the well-known Kullback-Leibler divergence [13]
between the opinionated document set and the relevant
document set. This measure quantifies the dissimilarity
between the two sets of documents. Terms which cause high
divergence are therefore good indicators of opinionatedness.
In order to calculate an opinion score for an entire
document, we can simply calculate the expected opinionatedness
of the words in the document:

$$\mathrm{opinion}_1(d) = \sum_{t \in d} \mathrm{opinion}(t)\, p(t|d) \qquad (3)$$

where $p(t|d)$ is the relative frequency of term $t$ in $d$.

Run          MAP     R-Precision  bPref   P@10
OurBaseline  0.2663  0.3478       0.3799  0.4273
opin0kl      0.2080  0.2597       0.3284  0.4040
opin0svm     0.2075  0.2698       0.3325  0.3547
Baseline1    0.3701  0.4156       0.4501  0.7307
opin1kl      0.3662  0.4136       0.4468  0.7187
opin1svm     0.2936  0.3569       0.4102  0.4947

Table 2: Topic-Relevance results for submitted runs.

Run          MAP     R-Precision  bPref   P@10
OurBaseline  0.1902  0.2429       0.2566  0.2920
opin0kl      0.1525  0.1986       0.2263  0.2967
opin0svm     0.1797  0.2372       0.2756  0.2873
Baseline1    0.2639  0.3189       0.3170  0.4753
opin1kl      0.2626  0.3178       0.3144  0.4760
opin1svm     0.2511  0.3136       0.3312  0.4100

Table 3: Opinion Retrieval results for submitted runs.
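The term-level and document-level opinion scores defined in this section can be sketched as below. This is only an illustration under stated assumptions: documents are taken to be pre-tokenized lists of words, the natural logarithm is used, and the function names are hypothetical. Because every opinionated document is also a relevant one, any term seen in the opinionated set has a non-zero frequency in the relevant set, so the logarithm is always defined.

```python
import math
from collections import Counter
from typing import Dict, Iterable, List

Doc = List[str]  # a document as a list of tokens (an assumption)


def relative_frequencies(docs: Iterable[Doc]) -> Dict[str, float]:
    """Relative frequency of each term over a set of documents."""
    counts: Counter = Counter()
    total = 0
    for doc in docs:
        counts.update(doc)
        total += len(doc)
    if total == 0:
        return {}
    return {term: c / total for term, c in counts.items()}


def term_opinion_scores(opinionated: List[Doc],
                        relevant: List[Doc]) -> Dict[str, float]:
    """Per-term contribution to the KL divergence between O and R."""
    p_o = relative_frequencies(opinionated)
    p_r = relative_frequencies(relevant)
    # Terms absent from the relevant set are skipped defensively;
    # this cannot happen when O is a subset of R.
    return {t: p * math.log(p / p_r[t]) for t, p in p_o.items() if t in p_r}


def document_opinion_score(doc: Doc, scores: Dict[str, float]) -> float:
    """Expected opinionatedness of the words in one document."""
    if not doc:
        return 0.0
    counts = Counter(doc)
    return sum(scores.get(t, 0.0) * c / len(doc) for t, c in counts.items())
```

Terms that are equally frequent in both sets receive a score of zero, so they do not affect the document score however often they occur.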

Alternatively, as stated previously, we can train a
classification system, in particular a Support Vector Machine
(SVM), to recognize opinionated documents. We can then
use the confidence of the classifier (i.e. the distance from the
hyperplane) as the opinion score for each document. The
per-term opinion score is used in this case only for feature
selection, with only the top 1,000 “most opinionated” terms
being used as features for the classifier:

$$\mathrm{opinion}_2(d) = f_{SVM}(p(t_1|d), \ldots, p(t_m|d)) \qquad (4)$$

where the function $f_{SVM}(\cdot)$ is the output (confidence) of the
trained SVM for a particular document and $m$ is the size of
the feature set (vocabulary).
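The classifier-based score can be illustrated with a small sketch that builds the feature vector of relative term frequencies and evaluates the signed distance to a separating hyperplane. It assumes a linear decision function; the weight vector and bias below stand in for a trained SVM and are hypothetical, as are the function names. Training itself is not shown.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def select_features(term_scores: Dict[str, float], m: int) -> List[str]:
    """Top-m terms by opinion score; ties are broken alphabetically."""
    return sorted(term_scores, key=lambda t: (-term_scores[t], t))[:m]


def feature_vector(doc: Sequence[str],
                   vocabulary: Sequence[str]) -> List[float]:
    """Relative frequency in the document of each vocabulary term."""
    counts = Counter(doc)
    n = len(doc)
    return [counts[t] / n if n else 0.0 for t in vocabulary]


def signed_distance(x: Sequence[float], weights: Sequence[float],
                    bias: float) -> float:
    """Signed distance of x from the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(w * w for w in weights))
    if norm == 0.0:
        raise ValueError("weight vector must be non-zero")
    return (sum(w * xi for w, xi in zip(weights, x)) + bias) / norm
```

A larger positive distance is read as higher confidence that the document is opinionated, and a negative one as confidence in the opposite class.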

For our TREC 2008 participation we trained a Learning
to Rank system to rank results by combining our relevance
score for the document with one of the two opinion
scores described above. Table 3 shows the opinion
retrieval results of the two proposed methods using our
baseline (opin0kl, opin0svm) and one of the TREC 2008 baselines
(opin1kl, opin1svm). Opin0kl is the run in which we used
our own baseline to find the relevant set of documents and
the expected opinionatedness of the words in the document,
as described above, to calculate the opinion score. In
opin0svm we also used our baseline, but
used the confidence of the trained SVM as the opinion
score. Opin1kl is the run in which we used
the TREC baseline1 to get the list of relevant documents
and then used the expected opinionatedness approach to
find the opinion score of the documents. Opin1svm used
the TREC baseline1 and the SVM score approach. Although
we could not improve on the baselines, comparing opin0kl and
opin0svm shows that using the SVM confidence as an opinion
score works better than the expected opinionatedness
approach on our baseline, but the reverse is true for the TREC
baseline. After the TREC submissions, we repeated the
experiments, performing a more comprehensive study and
using 10-fold cross-validation. We discovered that even for the
TREC baselines, using an SVM to calculate opinion scores
for documents works better than the expected
opinionatedness approach. The results showed about a 17% improvement
over TREC baseline1.
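For readers unfamiliar with the protocol, a 10-fold split can be sketched as follows. The source does not say how the folds were formed; the sketch assumes they are built over topic identifiers, and the function name and the round-robin assignment are illustrative choices.

```python
from typing import List, Sequence, Tuple


def k_fold_splits(topic_ids: Sequence[str],
                  k: int = 10) -> List[Tuple[List[str], List[str]]]:
    """Return k (train, test) pairs; each topic is tested exactly once."""
    if not 2 <= k <= len(topic_ids):
        raise ValueError("k must be between 2 and the number of topics")
    # Round-robin assignment keeps fold sizes within one of each other.
    folds = [list(topic_ids[i::k]) for i in range(k)]
    splits = []
    for i in range(k):
        test = folds[i]
        train = [t for j, fold in enumerate(folds) if j != i for t in fold]
        splits.append((train, test))
    return splits
```

Each model is trained on the topics of nine folds and evaluated on the remaining fold, and the per-fold results are then averaged.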

A distinct advantage of our “multiple levels of learning”
approach is that we maximize the use of available training
data and thus do not need to rely on external sources of
opinion-bearing word lists that may not be well-suited to
the blog opinion retrieval task.

4.  BLOG  DISTILLATION 

Blog search users often wish to identify blogs about a given
topic so that they can subscribe to them and read them on a
regular basis [12]. The blog distillation task can be defined
as: “Find me a blog with a principle, recurring interest in the
topic X.” Systems should suggest feeds that are principally
devoted to the topic over the timespan of the feed, and would
be recommended to a user as an interesting feed about the
topic (i.e. a user may be interested in adding it to their RSS
reader).

The blog distillation task has been approached from many
different points of view. In [6], the authors view it as ad-hoc
search and consider each blog as a long document created
by concatenating all postings together. Other researchers
treat it as the resource ranking problem in federated search
[7]. They view the blog search problem as the task of
ranking collections of blog posts rather than single documents.
A similar approach is used in [16], where a blog is again
considered as a collection of postings and resource
selection approaches are applied. The intuition is that finding
relevant blogs is similar to finding relevant collections in a
distributed search environment. In [12], the authors model blog
distillation as an expert search problem and use a voting model
to tackle it.

Our intuition is that each posting in a blog provides
evidence regarding the relevancy of that blog to a specific topic.
Blogs with more (positive) evidence are more likely to be
relevant. Moreover, each posting has many different features,
like content, in-links, and anchor text, that can be used to
estimate relevancy. There are also global features of each
blog, like the total number of postings, the number of
postings that are relevant to the topic and the cohesiveness of
the blog, that could be useful to consider. The next sections
are organized as follows. First we describe our approach.
Then we show our experimental results, and finally we discuss
conclusions and future work.


5.  APPROACH 

Our first approach is to create a baseline blog
distillation system that uses only the content of blog posts as a
source of evidence. To do this, we consider the expert search
idea proposed in [8]. The main idea of that work is to treat
blogs as experts and feed distillation as expert search. In
the expert search task, systems are asked to rank candidate
experts with respect to their predicted expertise about a
query, using documentary evidence of expertise found in the
collection. So the idea is that the blog distillation task can
be seen as a voting process: a blogger with an interest in a
topic would post regularly about the topic, and these
blog posts would be retrieved in response to the query. Each
time a blog post is retrieved, it can be seen as a vote for that
blog as being relevant to (an expert in) the topic area.

We use the voting model to find relevant blogs. The model
ranks blogs by the sum of the exponentials of the
relevance scores of the postings associated with each blog.
It is one of the data fusion models that Macdonald
and Ounis used in their expert search system [11]:

$$\mathrm{score}(B, Q) = \sum_{p \in R(Q) \cap B} \exp(\mathrm{score}(p, Q)) \qquad (5)$$

where $R(Q)$ is the set of retrieved postings for the query $Q$,
and $B$ is the set of posts for blog $B$.
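The voting model can be sketched in a few lines: every retrieved post adds the exponential of its relevance score to the total of the blog it belongs to. The function name and the input layout (a list of post id and score pairs plus a post-to-blog mapping) are assumptions made for this illustration.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple


def vote_for_blogs(retrieved: List[Tuple[str, float]],
                   post_to_blog: Dict[str, str]) -> List[Tuple[str, float]]:
    """Rank blogs by the sum of exp(score) over their retrieved posts."""
    blog_scores: Dict[str, float] = defaultdict(float)
    for post_id, score in retrieved:
        blog = post_to_blog.get(post_id)
        if blog is None:
            # Posts that cannot be attributed to a blog cast no vote.
            continue
        blog_scores[blog] += math.exp(score)
    return sorted(blog_scores.items(), key=lambda kv: kv[1], reverse=True)
```

Because the exponential grows quickly, a few highly scored posts count for more than many weakly matching ones, while a blog with several relevant posts still accumulates votes.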

For our second approach, we investigated using more
features to represent each blog besides its content. To take
the different features into account, we use a Rank Learning
[18] approach to combine the features into a single retrieval
function. Features that we thought could be useful were:


•  Cohesiveness  of  blog  postings 

•  Number  of  postings 

•  Number  of  relevant  postings  (posts  in  top  N  relevance 
results) 

•  Number  of  inlinks 

•  Relevance  of  inlink  post  content 

•  Relevance  of  inlink  anchor-text 


Table 4 shows the features that were used to learn a
retrieval function. In the formulas, $|B|$ denotes the
number of posts in the blog $B$, inlinks-source(B) denotes the
set of postings that contain a link to (a post in) blog $B$,
inlinks-anchor-text(B) is the set of anchor-texts for those
links, and $KL(p\|B)$ is the Kullback-Leibler divergence
between a posting $p$ and its blog $B$:

$$D_{KL}(p\|B) = \sum_{t \in p} w_p(t) \log \frac{w_p(t)}{w_B(t)} \qquad (6)$$

where $w_p(t)$ and $w_B(t)$ are the relative frequencies of the
term $t$ in the post and in the blog (as a whole) respectively.
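The divergence above, and the cohesiveness feature built from it as the average divergence over the posts of a blog, can be sketched as follows. The sketch assumes that a blog is represented by the concatenation of its tokenized posts, so that every term of a post has a non-zero frequency in the blog; the function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def relative_frequency(tokens: Sequence[str]) -> Dict[str, float]:
    """Relative frequency of each term in a token sequence."""
    counts = Counter(tokens)
    n = len(tokens)
    return {t: c / n for t, c in counts.items()} if n else {}


def kl_divergence(post: Sequence[str], blog_tokens: Sequence[str]) -> float:
    """KL divergence between a post and the blog that contains it."""
    w_p = relative_frequency(post)
    w_b = relative_frequency(blog_tokens)
    return sum(p * math.log(p / w_b[t]) for t, p in w_p.items())


def cohesiveness(posts: List[Sequence[str]]) -> float:
    """Average divergence of the posts from their blog as a whole."""
    if not posts:
        return 0.0
    blog_tokens = [t for post in posts for t in post]
    return sum(kl_divergence(post, blog_tokens) for post in posts) / len(posts)
```

Lower values indicate a more cohesive blog: when all posts use the same vocabulary in the same proportions, the value is zero.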


6.  EXPERIMENTAL  RESULTS 

For evaluating our methods we used the TREC Blogs06
test collection, which is a crawl of 100k blogs over an
11-week period [10]. This dataset includes the blog postings
(permalinks), feeds and homepages for each blog. In our
experiments we use only the permalinks component of the
collection, which consists of approximately 3.2 million
documents [8]. We used the Terrier Information Retrieval
system³ to index the collection.

To find relevant posts for each topic, the DFR_BM25
weighting model was used to compute a score for each blog
post [14]. We chose this weighting model based on its
performance on the TREC’06 and ’07 blog post opinion retrieval
baselines. For each query we selected the 20,000 highest
scoring postings as $R(Q)$ and used the voting model
(Formula 5) to combine post relevance scores into blog relevance
scores. Figure 1 shows the performance of our baseline for
each query compared to the best, worst and median precision
for that query among all groups in TREC’08, where queries
are sorted based on our baseline precision⁴. Our baseline
has reasonable performance across the queries
and outperforms the median for most of them.

³ http://ir.dcs.gla.ac.uk/terrier/

Feature Name                   Description
Blog Relevancy                 $\sum_{p \in R(Q) \cap B} \exp(\mathrm{score}(p, Q))$
Cohesiveness                   $\sum_{p \in B} KL(p\|B) \div |B|$
Normalized Blog Relevancy      Blog Relevancy $\div |B|$
Fraction of Relevant Postings  $|B \cap R(Q)| \div |B|$
Inlink Content Relevancy       $\sum_{d \in \text{inlinks-source}(B)} \exp(\mathrm{score}(d, Q)) \div |\text{inlinks}(B)|$
Inlink Anchor-Text Relevancy   $\sum_{a \in \text{inlinks-anchor-text}(B)} \exp(\mathrm{score}(a, Q)) \div |\text{inlinks}(B)|$

Table 4: Selected features for learning phase of blog distillation approach.

Figure 1: Average Precision for each query (ordered
by relative performance) for the blog distillation
task using a simple voting model. The best,
median and worst scores are those of the other
participants in TREC’08.

For the second experiment we extracted the features in
Table 4 and used a rank learning system (SVM-map⁵) to
learn a ranking function over them [18]. For the learning
phase we trained the rank learner on the 45 topics and their
corresponding relevance assessments used in the TREC’07
blog distillation task. We then tested the model on the 50
topics used for TREC’08.

Figure 2 shows the precision-recall curve for the trained
ranking model compared to the baseline expert search
approach. We see that using other features besides content
does not help in retrieving better blogs. These results are
consistent with the results in [8], where the use of anchor text and
cohesiveness did not appear to improve performance. The
reason is possibly the data set itself. Only 3% of
postings have a link to another posting inside the data set,
and even among this small percentage of links there are some noisy
and non-informative links. Thus we will need to look for a
better way to use this information. Furthermore, the small
amount of training data available (only 45 queries) may
have affected the ability of the rank learner to learn a model
that generalizes well to the test data.

⁴ Jonathan Elsas suggested this representation in his weblog,
http://windowoffice.tumblr.com/

⁵ http://projects.yisongyue.com/svmmap/

Figure 2: Precision-Recall for the baseline and the rank
learning model.

7.  CONCLUSIONS  AND  FUTURE  WORK 

In the Opinion Retrieval task in TREC’08 we tried a
learning approach for generating opinion scores for individual
documents and also for learning a ranking function that
combines opinion evidence with relevancy into a single
ranking function. A distinct advantage of our approach is that
by performing multiple levels of learning, we maximize the
use of available training data while not relying on external
sources of opinion-bearing word lists that may not
be well-suited to the blog opinion retrieval task. In future we
plan to extend the basic framework and consider the
proximity of query terms to opinionated terms in assigning opinion
weights to documents. We also plan to expand the feature
space and capture more complicated opinion expressions by
using bigrams or trigrams.

In the Distillation task we have implemented a baseline
that has reasonable results compared with other systems.
However, we have learned that Blog Distillation performance does
not seem to improve when using the information contained
in links between posts in a blog data set. We conjecture that
this is because some links are not useful and that we should find a
way to use this link information only when the probability of its
informativeness is high. To do this, we plan to focus only on
links to blogs that are related to the topic and analyse their
structure. Since the most important blogs about a topic
(those that contain the most information) will likely also be
the most influential, other related blogs would link to them.
So we can try to measure the influence of a blog in terms of
the number of inlinks from blogs containing similar content.
In this case the similarity between the source and destination of
a link will be taken into account in addition to the similarity
between blogs and topics.